 preliminary evaluation


US regulators launch investigation into self-driving Teslas after series of crashes

The Guardian

The preliminary evaluation by NHTSA is the first step before potentially seeking a recall of the vehicles. US automobile safety regulators have opened an investigation into Tesla vehicles equipped with its full self-driving technology over traffic-safety violations after a series of crashes. The National Highway Traffic Safety Administration (NHTSA) said the electric carmaker's self-driving assistance system, which requires drivers to pay attention and intervene if needed, had "induced vehicle behaviour that violated traffic safety laws". The preliminary evaluation by the NHTSA is the first step before potentially seeking a recall of the vehicles if it believes they pose a risk to safety.


US investigates 2.4m Tesla self-driving vehicles after reported collisions

The Guardian

The US government's road safety agency has opened an investigation into 2.4m Tesla vehicles with the automaker's Full Self-Driving software after four reported collisions, including a fatal crash. The National Highway Traffic Safety Administration (NHTSA) on Friday said it was opening the preliminary evaluation after four reports of crashes where Full Self-Driving was engaged during reduced roadway visibility like sun glare, fog or airborne dust. In one crash "the Tesla vehicle fatally struck a pedestrian. One additional crash in these conditions involved a reported injury," NHTSA said. The investigation covers 2016-2024 Model S and X vehicles with the optional system as well as 2017-2024 Model 3, 2020-2024 Model Y, and 2023-2024 Cybertruck vehicles.


Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems

arXiv.org Artificial Intelligence

Benchmarks are critical for measuring progress of math reasoning abilities of Large Language Models (LLMs). However, existing widely-used benchmarks such as GSM8K have been rendered less useful as multiple cutting-edge LLMs achieve over 94% accuracy. While harder benchmarks have been proposed, their creation is often manual and expensive. We present Scheherazade, an automated approach for producing challenging mathematical reasoning benchmarks by logically chaining mathematical reasoning problems. We propose two different chaining methods, forward chaining and backward chaining, which require reasoning forward and backward through the chain respectively. We apply Scheherazade on GSM8K to create GSM8K-Scheherazade and evaluate 3 frontier LLMs and OpenAI's o1-preview on it. We show that while frontier models' performance declines precipitously at only a few questions chained, a preliminary evaluation suggests o1-preview performance persists up to 5 questions chained backwards. In addition, while all other models perform worse when problems are chained backwards, o1-preview performs better on backward-chained benchmarks. We will release the dataset and code publicly.


GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing

arXiv.org Artificial Intelligence

Multimodal large language models (MLLMs) are designed to process and integrate information from multiple sources, such as text, speech, images, and videos. Despite its success in language understanding, it is critical to evaluate the performance of downstream tasks for better human-centric applications. This paper assesses the application of MLLMs with 5 crucial abilities for affective computing, spanning from visual affective tasks and reasoning tasks. The results show that GPT-4V has high accuracy in facial action unit recognition and micro-expression detection while its general facial expression recognition performance is not accurate. We also highlight the challenges of achieving fine-grained micro-expression recognition and the potential for further study and demonstrate the versatility and potential of GPT-4V for handling advanced tasks in emotion recognition and related fields by integrating with task-related agents for more complex tasks, such as heart rate estimation through signal processing. In conclusion, this paper provides valuable insights into the potential applications and challenges of MLLMs in human-centric computing. Our interesting examples are at https://github.com/EnVision-Research/GPT4Affectivity.


THUIR2 at NTCIR-16 Session Search (SS) Task

arXiv.org Artificial Intelligence

Our team (THUIR2) participated in both FOSS and POSS subtasks of the NTCIR-16 Session Search (SS) Task. This paper describes our approaches and results. In the FOSS subtask, we submit five runs using learning-to-rank and fine-tuned pre-trained language models. We fine-tuned the pre-trained language model with ad-hoc data and session information and assembled them by a learning-to-rank method. The assembled model achieves the best performance among all participants in the preliminary evaluation. In the POSS subtask, we used an assembled model which also achieves the best performance in the preliminary evaluation.


A Preliminary Evaluation of ChatGPT for Zero-shot Dialogue Understanding

arXiv.org Artificial Intelligence

Zero-shot dialogue understanding aims to enable dialogue to track the user's needs without any training data, which has gained increasing attention. In this work, we investigate the understanding ability of ChatGPT for zero-shot dialogue understanding tasks including spoken language understanding (SLU) and dialogue state tracking (DST). Experimental results on four popular benchmarks reveal the great potential of ChatGPT for zero-shot dialogue understanding. In addition, extensive analysis shows that ChatGPT benefits from the multi-turn interactive prompt in the DST task but struggles to perform slot filling for SLU. Finally, we summarize several unexpected behaviors of ChatGPT in dialogue understanding tasks, hoping to provide some insights for future research on building zero-shot dialogue understanding systems with Large Language Models (LLMs).


Robofriend: An Adaptive Storytelling Robotic Teddy Bear -- Technical Report

arXiv.org Artificial Intelligence

Language exposure at an early stage of development is critical for the facilitation of brain networks associated with language Kuhl [2004], Cardillo and Kuhl [2009], Moon et al. [2013]. Storytelling is one form of language exposure, which was found to be associated with a greater engagement not only in language processing but also in visualization and cognitive abilities in children Hutton et al. [2015]. Interestingly, it was suggested that it is not the storytelling itself that is related to these improvements, but it is the interaction during the stories that amplify these abilities in children Twait et al. [2019]. A recent study demonstrated how a group of 4-6-year-old children attending storytelling sessions interactively vs. a group attending non-interactively (storytelling sessions on the screen), shared greater cognitive and language abilities Twait et al. [2019]. Hence, a question was raised regarding this positive effect during interactive (dialogic) storytelling - is the positive effect due to the human interaction?


First Known Tesla Autopilot Death Spurs Federal Investigation

Popular Science

We learned yesterday evening that NHTSA is opening a preliminary evaluation into the performance of Autopilot during a recent fatal crash that occurred in a Model S. This is the first known fatality in just over 130 million miles where Autopilot was activated. Among all vehicles in the US, there is a fatality every 94 million miles. Worldwide, there is a fatality approximately every 60 million miles. It is important to emphasize that the NHTSA action is simply a preliminary evaluation to determine whether the system worked according to expectations. Following our standard practice, Tesla informed NHTSA about the incident immediately after it occurred.


Tesla's 'Autopilot' feature probed after fatal crash

USATODAY - Tech Top Stories

A preliminary investigation has begun for a fatal car crash involving a Tesla Model S. According to the National Highway Traffic Safety Administration, the electric model sedan had Autopilot mode engaged when a driver was killed. The National Highway Traffic Safety Administration has opened a preliminary evaluation into the fatal crash of a Tesla electric car that had its "Autopilot" feature engaged at the time of the incident. NHTSA says in its filing that the crash was reported by Tesla and that its probe, part of a process that can eventually lead to a recall, centers on the car's self-driving feature. "This preliminary evaluation is being opened to examine the design and performance of any automated driving systems in use at the time of the crash," the safety agency said in a filing. The crash occurred when a tractor-trailer made a left turn in front of a 2015 Tesla on a highway near Williston, Fla., NHTSA said. The driver died due to injuries sustained in the accident.


Fatal crash of Tesla Model S in autopilot prompts 'preliminary evaluation' by federal officials

Los Angeles Times

The National Highway Traffic Safety Administration is opening a preliminary evaluation into Tesla's autopilot feature, after the fatal crash of a Model S that was in self-driving mode, the electric automaker said Thursday. According to a blog post from Tesla Motors Inc., the car was on an unnamed, divided highway when a tractor trailer drove across the road perpendicular to the Model S. "Neither Autopilot nor the driver noticed the white side of the tractor trailer against a brightly lit sky, so the brake was not applied," Tesla said in the post. The Model S passed under the trailer, with the bottom of the trailer impacting the windshield of the Model S, Tesla said. Tesla said this was the first fatality in which the autopilot feature was activated, with more than 130 million miles driven using that feature. The Palo Alto automaker said it informed NHTSA about the incident "immediately after it occurred."